Abstract: Computer forensic analysis
is the discipline that combines elements of law and computer science
to analyze computers seized by forensic departments. Clustering algorithms are typically used
for exploratory data analysis, where there is little or no prior knowledge
about the data. This is precisely the case in several applications of Computer
Forensics, including the one addressed in our work. In particular, algorithms for
clustering documents can facilitate the discovery of new and useful knowledge
from the documents under analysis. We present an approach that applies document
clustering algorithms to forensic analysis of computers seized in police
investigations. We evaluate the approach experimentally
with six well-known clustering algorithms (K-means, K-medoids, Single Link, Complete Link, Average Link, and
CSPA) applied to five real-world datasets
obtained from computers seized in real-world investigations. Automatically labeling document clusters with words which identify their
topics is difficult to do well. To address this problem, we present two
methods of labeling document clusters motivated by the
model that words are generated by a hierarchy of mixture components of varying
generality. The first method assumes existence of a document hierarchy
(manually constructed or resulting from a hierarchical clustering algorithm)
and uses a chi-squared test of significance to detect differences in word usage
across categories in the hierarchy. The second method selects words that both
occur frequently in a cluster and effectively differentiate the given cluster
from the other clusters. We compare
these methods on abstracts of documents selected from a subset of the hierarchy
of the Cora search engine for computer science research papers. The labels produced
by our methods outperformed those produced by the commonly employed methods.
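The abstract does not say how the chi-squared test is set up, so the following is only a minimal sketch under one simple reading. For each word, a 2x2 contingency table compares its count in a child category with its count in the sibling categories under the same parent, and a Pearson chi-squared test with one degree of freedom (p-value computed with `math.erfc`) flags words used significantly more often in the child. The helper names `chi2_2x2` and `distinctive_words`, the whitespace tokenization, the toy documents, and the `alpha = 0.05` threshold are all hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no evidence of a difference
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def distinctive_words(child_docs, sibling_docs, alpha=0.05):
    """Words used significantly more often in a child category than in its siblings.

    Hypothetical helper: whitespace tokenization, 2x2 tables and alpha are assumptions.
    """
    child = Counter(w for doc in child_docs for w in doc.lower().split())
    sibling = Counter(w for doc in sibling_docs for w in doc.lower().split())
    n_child, n_sibling = sum(child.values()), sum(sibling.values())
    found = []
    for word, a in child.items():
        b = n_child - a      # other words in the child category
        c = sibling[word]    # the same word in the sibling categories
        d = n_sibling - c    # other words in the sibling categories
        stat = chi2_2x2(a, b, c, d)
        p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared survival function, 1 df
        # Keep only words that are significant AND over-represented in the child.
        if p_value < alpha and a / n_child > c / n_sibling:
            found.append((word, stat, p_value))
    return sorted(found, key=lambda t: -t[1])  # strongest evidence first


# Hypothetical toy data: one child category and its siblings.
child_docs = ["kernel svm margin kernel", "svm kernel training data", "margin kernel svm data"]
sibling_docs = ["parser grammar data", "grammar parser tree data", "tree parser grammar rules"]
for word, stat, p in distinctive_words(child_docs, sibling_docs):
    print(f"{word}: chi2={stat:.2f}, p={p:.3f}")
```

On this toy data only "kernel" passes the 0.05 threshold. A word that is common in both the child and its siblings, such as "data", produces a statistic near zero even though it is frequent in the child, which is the behavior that makes the test useful for labeling.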
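The second method needs a score that rewards two properties at once: a word should be frequent inside the cluster and should differentiate that cluster from the others. The abstract gives no formula, so the sketch below uses a hypothetical score: the word's relative frequency inside the cluster multiplied by the fraction of all its occurrences that fall in that cluster. The function name `label_clusters`, the parameter `k`, the tokenization, and the toy clusters are illustrative assumptions.

```python
from collections import Counter


def label_clusters(clusters, k=3):
    """Pick k label words per cluster that are frequent in it and rare elsewhere.

    clusters maps a cluster name to a list of documents (strings).
    The frequency * discrimination score is a hypothetical choice.
    """
    counts = {name: Counter(w for doc in docs for w in doc.lower().split())
              for name, docs in clusters.items()}
    totals = Counter()
    for c in counts.values():
        totals.update(c)  # occurrences of each word across all clusters
    labels = {}
    for name, c in counts.items():
        size = sum(c.values())
        scores = {
            # in-cluster frequency * share of the word's occurrences in this cluster
            w: (n / size) * (n / totals[w])
            for w, n in c.items()
        }
        # Highest score first; ties broken alphabetically for deterministic output.
        labels[name] = sorted(scores, key=lambda w: (-scores[w], w))[:k]
    return labels


# Hypothetical toy clusters.
clusters = {
    "A": ["kernel svm margin kernel", "svm kernel training data", "margin kernel svm data"],
    "B": ["parser grammar data", "grammar parser tree data", "tree parser grammar rules"],
}
print(label_clusters(clusters))
```

This prints `{'A': ['kernel', 'svm', 'margin'], 'B': ['grammar', 'parser', 'tree']}`. The word "data" is frequent in both clusters, so its discrimination factor halves its score and it never reaches the top three. Taking the product means a word that fails either criterion scores low. A purely frequency-based ranking would not have that property.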
Keywords:
Data mining, Forensic Analysis, Clustering, Fuzzy c-means,
EM Algorithm